{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## 支持向量机  （support vector machines, SVM）\n",
    "\n",
    "> 支持向量机（support vector machines，SVM）是一种二分类模型，它将实例的特征向量映射为空间中的一些点，SVM 的目的就是想要画出一条线，以 “最好地” 区分这两类点，以至如果以后有了新的点，这条线也能做出很好的分类。SVM 适合中小型数据样本、非线性、高维的分类问题\n",
    "\n",
    "SVM学习的基本想法是\n",
    "  求解能够正确划分训练数据集并且几何间隔最大的分离超平面\n",
    "\n",
    "对于线性可分的数据集来说，这样的超平面有无穷多个（即感知机），但是几何间隔最大的分离超平面却是唯一的。\n",
    "\n",
    "Advantages 优势：\n",
    " * Effective in high dimensional spaces. 在高维空间中有效。\n",
    " * Still effective in cases where number of dimensions is greater than the number of samples. 在尺寸数大于样本数的情况下仍然有效。\n",
    " * Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.\n",
    "   在决策函数中使用训练点的子集(称为支持向量)，因此它也具有记忆效率。\n",
    " * Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.\n",
    "   通用:可以为决策函数指定不同的核函数。提供了通用内核，但也可以指定自定义内核。\n",
    "\n",
    "disadvantages 缺点:\n",
    "\n",
    " * If the number of features is much greater than the number of samples, avoid over-fitting in choosing Kernel functions and regularization term is crucial.\n",
    "   当特征个数远大于样本个数时，在选择核函数和正则化项时应避免过拟合。\n",
    "\n",
    " * SVMs do not directly provide probability estimates, these are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).\n",
    "   支持向量机不直接提供概率估计，这些是使用昂贵的五倍交叉验证计算的(见下面的分数和概率)。\n",
    "\n",
    "\n",
    "[支持向量机](https://blog.csdn.net/qq_31347869/article/details/88071930)\n",
    "[sklearn文档-svm](https://scikit-learn.org/dev/modules/svm.html#svm)"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "The sklearn.svm module includes Support Vector Machine algorithms.\n",
    "\n",
    "|Estimators | description |\n",
    "|:---- |:----  |\n",
    "| svm.LinearSVC([penalty, loss, dual, tol, C, …]) | Linear Support Vector Classification.   |\n",
    "| svm.LinearSVR(*[, epsilon, tol, C, loss, …])    | Linear Support Vector Regression.       |\n",
    "| svm.NuSVC(*[, nu, kernel, degree, gamma, …])    | Nu-Support Vector Classification.       |\n",
    "| svm.NuSVR(*[, nu, C, kernel, degree, gamma, …]) | Nu Support Vector Regression.           |\n",
    "| svm.OneClassSVM(*[, kernel, degree, gamma, …])  | Unsupervised Outlier Detection.         |\n",
    "| svm.SVC(*[, C, kernel, degree, gamma, …])       | C-Support Vector Classification.        |\n",
    "| svm.SVR(*[, kernel, degree, gamma, coef0, …])   | Epsilon-Support Vector Regression.      |"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "###  Classification 使用支持向量机做分类任务\n",
    "\n",
    "SVC, NuSVC and LinearSVC are classes capable of performing binary and multi-class classification on a dataset."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "from sklearn.svm import SVC\n",
    "import numpy as np\n",
    "\n",
    "X = np.random.randint(0,10,(50,2))\n",
    "y = (X[:,0] + X[:,1]) // 10\n",
    "clf = SVC()\n",
    "clf.fit(X, y)\n",
    "clf.predict([[2., 7.],[3,9]]) # 结果会不一样"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "print(__doc__)\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn import svm, datasets\n",
    "\n",
    "def make_meshgrid(x, y, h=.02):\n",
    "    \"\"\"Create a mesh of points to plot in\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    x: data to base x-axis meshgrid on\n",
    "    y: data to base y-axis meshgrid on\n",
    "    h: stepsize for meshgrid, optional\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    xx, yy : ndarray\n",
    "    \"\"\"\n",
    "    x_min, x_max = x.min() - 1, x.max() + 1\n",
    "    y_min, y_max = y.min() - 1, y.max() + 1\n",
    "    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n",
    "                         np.arange(y_min, y_max, h))\n",
    "    return xx, yy\n",
    "\n",
    "\n",
    "def plot_contours(ax, clf, xx, yy, **params):\n",
    "    \"\"\"Plot the decision boundaries for a classifier.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    ax: matplotlib axes object\n",
    "    clf: a classifier\n",
    "    xx: meshgrid ndarray\n",
    "    yy: meshgrid ndarray\n",
    "    params: dictionary of params to pass to contourf, optional\n",
    "    \"\"\"\n",
    "    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])\n",
    "    Z = Z.reshape(xx.shape)\n",
    "    out = ax.contourf(xx, yy, Z, **params)\n",
    "    return out\n",
    "\n",
    "\n",
    "# import some data to play with\n",
    "iris = datasets.load_iris()\n",
    "# Take the first two features. We could avoid this by using a two-dim dataset\n",
    "X = iris.data[:, :2]\n",
    "y = iris.target\n",
    "\n",
    "# we create an instance of SVM and fit out data. We do not scale our\n",
    "# data since we want to plot the support vectors\n",
    "C = 1.0  # SVM regularization parameter\n",
    "models = (svm.SVC(kernel='linear', C=C),\n",
    "          svm.LinearSVC(C=C, max_iter=10000),\n",
    "          svm.SVC(kernel='rbf', gamma=0.7, C=C),\n",
    "          svm.SVC(kernel='poly', degree=3, gamma='auto', C=C))\n",
    "models = (clf.fit(X, y) for clf in models)\n",
    "\n",
    "# title for the plots\n",
    "titles = ('SVC with linear kernel',\n",
    "          'LinearSVC (linear kernel)',\n",
    "          'SVC with RBF kernel',\n",
    "          'SVC with polynomial (degree 3) kernel')\n",
    "\n",
    "# Set-up 2x2 grid for plotting.\n",
    "fig, sub = plt.subplots(2, 2)\n",
    "plt.subplots_adjust(wspace=0.4, hspace=0.4)\n",
    "\n",
    "X0, X1 = X[:, 0], X[:, 1]\n",
    "xx, yy = make_meshgrid(X0, X1)\n",
    "\n",
    "for clf, title, ax in zip(models, titles, sub.flatten()):\n",
    "    plot_contours(ax, clf, xx, yy,\n",
    "                  cmap=plt.cm.coolwarm, alpha=0.8)\n",
    "    ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')\n",
    "    ax.set_xlim(xx.min(), xx.max())\n",
    "    ax.set_ylim(yy.min(), yy.max())\n",
    "    ax.set_xlabel('Sepal length')\n",
    "    ax.set_ylabel('Sepal width')\n",
    "    ax.set_xticks(())\n",
    "    ax.set_yticks(())\n",
    "    ax.set_title(title)\n",
    "\n",
    "plt.show()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Regression\n",
    "There are three different implementations of Support Vector Regression: SVR, NuSVR and LinearSVR. LinearSVR provides a faster implementation than SVR but only considers the linear kernel, while NuSVR implements a slightly different formulation than SVR and LinearSVR. See Implementation details for further details."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "from sklearn.svm import SVR\n",
    "X = [[0, 0], [2, 2]]\n",
    "y = [0.5, 2.5]\n",
    "regr = SVR()\n",
    "regr.fit(X, y)\n",
    "regr.predict([[1, 1]])"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Unsupervised Outlier Detection.\n",
    "\n",
    "Estimate the support of a high-dimensional distribution.\n",
    "\n",
    "OneClassSVM is based on libsvm."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "from sklearn.svm import OneClassSVM\n",
    "X = [[0], [0.44], [0.45], [0.46], [1]]\n",
    "clf = OneClassSVM(gamma='auto')\n",
    "clf.fit(X)\n",
    "result = clf.predict(X)\n",
    "print(result)\n",
    "scores = clf.score_samples(X)\n",
    "print(scores)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 使用SVM做异常检测算法\n",
    "\n",
    "Comparing anomaly detection algorithms for outlier detection on toy datasets\n",
    "\n",
    "[refrence](https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_anomaly_comparison.htm)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>\n",
    "#         Albert Thomas <albert.thomas@telecom-paristech.fr>\n",
    "# License: BSD 3 clause\n",
    "\n",
    "import time\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "import sklearn\n",
    "print(sklearn.__version__)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_moons, make_blobs\n",
    "from sklearn.covariance import EllipticEnvelope\n",
    "from sklearn.ensemble import IsolationForest\n",
    "from sklearn.neighbors import LocalOutlierFactor\n",
    "from sklearn.svm import OneClassSVM\n",
    "from sklearn.kernel_approximation import Nystroem\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.linear_model import SGDOneClassSVM\n",
    "\n",
    "print(__doc__)\n",
    "\n",
    "matplotlib.rcParams['contour.negative_linestyle'] = 'solid'\n",
    "\n",
    "# Example settings\n",
    "n_samples = 300\n",
    "outliers_fraction = 0.15\n",
    "n_outliers = int(outliers_fraction * n_samples)\n",
    "n_inliers = n_samples - n_outliers\n",
    "\n",
    "# define outlier/anomaly detection methods to be compared.\n",
    "# the SGDOneClassSVM must be used in a pipeline with a kernel approximation\n",
    "# to give similar results to the OneClassSVM\n",
    "anomaly_algorithms = [\n",
    "    (\"Robust covariance\", EllipticEnvelope(contamination=outliers_fraction)),\n",
    "    (\"One-Class SVM\",  OneClassSVM(nu=outliers_fraction, kernel=\"rbf\",\n",
    "                                      gamma=0.1)),\n",
    "    (\"One-Class SVM (SGD)\", make_pipeline(\n",
    "        Nystroem(gamma=0.1, random_state=42, n_components=150),\n",
    "        SGDOneClassSVM(nu=outliers_fraction, shuffle=True,\n",
    "                       fit_intercept=True, random_state=42, tol=1e-6)\n",
    "    )),\n",
    "    (\"Isolation Forest\", IsolationForest(contamination=outliers_fraction,\n",
    "                                         random_state=42)),\n",
    "    (\"Local Outlier Factor\", LocalOutlierFactor(\n",
    "        n_neighbors=35, contamination=outliers_fraction))]\n",
    "\n",
    "# Define datasets\n",
    "blobs_params = dict(random_state=0, n_samples=n_inliers, n_features=2)\n",
    "datasets = [\n",
    "    make_blobs(centers=[[0, 0], [0, 0]], cluster_std=0.5,\n",
    "               **blobs_params)[0],\n",
    "    make_blobs(centers=[[2, 2], [-2, -2]], cluster_std=[0.5, 0.5],\n",
    "               **blobs_params)[0],\n",
    "    make_blobs(centers=[[2, 2], [-2, -2]], cluster_std=[1.5, .3],\n",
    "               **blobs_params)[0],\n",
    "    4. * (make_moons(n_samples=n_samples, noise=.05, random_state=0)[0] -\n",
    "          np.array([0.5, 0.25])),\n",
    "    14. * (np.random.RandomState(42).rand(n_samples, 2) - 0.5)]\n",
    "\n",
    "# Compare given classifiers under given settings\n",
    "xx, yy = np.meshgrid(np.linspace(-7, 7, 150),\n",
    "                     np.linspace(-7, 7, 150))\n",
    "\n",
    "plt.figure(figsize=(len(anomaly_algorithms) * 2 + 4, 12.5))\n",
    "plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05,\n",
    "                    hspace=.01)\n",
    "\n",
    "plot_num = 1\n",
    "rng = np.random.RandomState(42)\n",
    "\n",
    "for i_dataset, X in enumerate(datasets):\n",
    "    # Add outliers\n",
    "    X = np.concatenate([X, rng.uniform(low=-6, high=6, size=(n_outliers, 2))],\n",
    "                       axis=0)\n",
    "\n",
    "    for name, algorithm in anomaly_algorithms:\n",
    "        t0 = time.time()\n",
    "        algorithm.fit(X)\n",
    "        t1 = time.time()\n",
    "        plt.subplot(len(datasets), len(anomaly_algorithms), plot_num)\n",
    "        if i_dataset == 0:\n",
    "            plt.title(name, size=18)\n",
    "\n",
    "        # fit the data and tag outliers\n",
    "        if name == \"Local Outlier Factor\":\n",
    "            y_pred = algorithm.fit_predict(X)\n",
    "        else:\n",
    "            y_pred = algorithm.fit(X).predict(X)\n",
    "\n",
    "        # plot the levels lines and the points\n",
    "        if name != \"Local Outlier Factor\":  # LOF does not implement predict\n",
    "            Z = algorithm.predict(np.c_[xx.ravel(), yy.ravel()])\n",
    "            Z = Z.reshape(xx.shape)\n",
    "            plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='black')\n",
    "\n",
    "        colors = np.array(['#377eb8', '#ff7f00'])\n",
    "        plt.scatter(X[:, 0], X[:, 1], s=10, color=colors[(y_pred + 1) // 2])\n",
    "\n",
    "        plt.xlim(-7, 7)\n",
    "        plt.ylim(-7, 7)\n",
    "        plt.xticks(())\n",
    "        plt.yticks(())\n",
    "        plt.text(.99, .01, ('%.2fs' % (t1 - t0)).lstrip('0'),\n",
    "                 transform=plt.gca().transAxes, size=15,\n",
    "                 horizontalalignment='right')\n",
    "        plot_num += 1\n",
    "\n",
    "plt.show()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}